Abstract. Separation is a classical problem in mathematics and computer
science. It asks whether, given two sets belonging to some class, it is possible
to separate them by another set of a smaller class. We present and discuss the
separation problem for regular languages. We then give a direct polynomial
time algorithm to check whether two given regular languages are separable by
a piecewise testable language, that is, whether a $\mathcal{B}\Sigma_1(<)$ sentence can witness
that the languages are indeed disjoint. The proof is a reformulation and a
refinement of an algebraic argument already given by Almeida and the second
author.



1. Introduction 

The separation problem. Separation is a classical question in mathematics and
computer science. In general, one says that two sets X, Y are separable by a set
U if $X \subseteq U$ and $Y \cap U = \emptyset$. In this case, U is called a separator.

The separation problem is the following. Consider a class C of sets or structures,
and a subclass $C_0$ of C. The problem asks whether two disjoint elements X, Y of C can
always be separated by an element of the subclass $C_0$. A classical example of
such a separation problem, with a positive answer, is the Hahn-Banach separation 
theorem. Another example that appeared recently in computer science is the proof 
of Leroux [14] of the decidability of the reachability problem for vector addition 
systems (or Petri Nets), which greatly simplifies the original proof by Mayr [15], 
and that of Kosaraju [12]. Namely, Leroux has shown that non-reachability can be 
witnessed by a class of recursively enumerable separators: from a configuration $c_1$
of such a system, one cannot reach a configuration $c_2$ if and only if the sets $\{c_1\}$ and
$\{c_2\}$ can be separated by a Presburger definable set, which in addition is invariant
under actions of the vector addition system. Since such sets form a recursively 
enumerable class, this yields a semi-algorithm for checking non-reachability. 

In the case where elements of C cannot always be separated by an element of
$C_0$, several natural questions arise:

(1) given elements X,Y in C, can we decide whether a separator exists in Co? 

(2) if so, what is the complexity of this decision problem? 

(3) can we in addition compute a separator, and what is the complexity? 

In this context, it is known for example that separation of two context-free 
languages by a regular one is undecidable [11]. 
In this paper, we look at the separation problem for the class C of regular
languages, and we look for separators in smaller classes, such as prefix-
or suffix-testable languages, locally trivial languages, and piecewise testable
languages (we will define these classes below).

The profinite approach. Several results from the literature can be combined 
into an algorithm answering question (1), for all classes we are interested in. Sev- 
eral partial complexity results can also be derived from this approach, which 
we briefly explain now. This approach relies on a generic connection found by 
Almeida [2] between profinite semigroup theory and the separation problem, when 
the separators are required to belong to a given variety of regular languages. 

A variety $\mathcal{V}$ of regular languages associates to each finite alphabet A a class of
languages $A^*\mathcal{V}$, with some closure properties (namely closure under Boolean
operations, left and right residuals $L \mapsto a^{-1}L$ and $L \mapsto La^{-1}$, and inverse morphisms
between free monoids). All classes of separators in this paper belong to a variety
of regular languages.

Almeida [2] has shown that two regular languages over A are separable by a
language of $A^*\mathcal{V}$ if and only if the topological closures of these two languages inside
a profinite semigroup, depending only on $\mathcal{V}$, do not intersect. To turn this property into
an algorithm, we therefore need to be able:

— to compute representations of these topological closures, and 

— to test for emptiness of intersections of such closures. 

So far, these problems have no generic answer. They have been studied for a 
small number of specific varieties, in an algebraic context. Deciding whether the 
closures of two regular languages intersect is equivalent to computing the so-called 
2-pointlike sets of a finite semigroup wrt. the variety we are interested in, see [2]. 
This question has been answered positively, in particular for the following varieties: 

i) languages recognized by a finite group [5, 17, 6],

ii) star-free (that is, FO-definable) languages [10, 9],

iii) piecewise testable (that is, $\mathcal{B}\Sigma_1(<)$-definable) languages [4, 3],

iv) languages whose syntactic semigroups are $\mathcal{R}$-trivial, that is, languages whose
minimal automaton is very weak (the only cycles allowed in the graph of the
automaton are self-loops) [3],

v) languages for which membership can be tested by inspecting prefixes and
suffixes up to some length (folklore, see [1, Sec. 3.7]),

vi) locally testable languages, that is, languages for which membership can be
tested by inspecting prefix, suffix and factors up to some length [20, 16].

For all these classes, proofs use algebraic or topological arguments. In this 
paper, we obtain direct polynomial time algorithms for Cases iii) and v). Our 
intuition is strongly led by the proof techniques from profinite semigroup theory.

A general issue is that the topological closures cannot be described with a finite 
device. However, for piecewise testable languages, the approach of [4] consists in 
computing an automaton over an extended alphabet, which recognizes the closure 
of the original language. This can be performed in polynomial time wrt. the size 
of the original automaton. Since these automata admit the usual construction
for intersection, and can be checked for emptiness in NLOGSPACE, we get a
polynomial time algorithm wrt. the size of the original automata. The construction
was presented for deterministic automata but also works for nondeterministic ones.
One should mention that the extended alphabet is $2^A$ (where A is the original
alphabet). Therefore, these results give an algorithm which, from two NFAs,
decides separability by piecewise testable languages in time polynomial in the
number of states of the NFAs and exponential in the size of the original alphabet.

The improvement of the separation result for piecewise testable languages as 
presented in this paper is twofold: on the one hand, the algorithm presented
provides better complexity, as it runs in polynomial time both in the size of the
automata and in the size of the alphabet. On the other hand, our results do
not make use of the theory of profinite semigroups, that is, we work only with
elementary concepts. However, the proof essentially follows the same pattern as the
original one.

The key argument is to show that non-separability is witnessed by both
automata admitting a path of the same shape. In our proof, we manually extract
from two non-separable automata some paths with this property, using Simon's
factorization forest theorem [19]. In the profinite world, by contrast, these
witnesses are obtained immediately by a standard compactness argument.

Organization of the paper. After having recalled the background in Section 2, 
we present in Section 3 a simple toy example, to highlight the main definitions 
and techniques: the case of separation by prefix-testable languages. Section 4 
is devoted to the question of separation by piecewise testable languages. The 
main algorithm and proofs are given in this section. For the interested reader, we
provide some elements of profinite semigroup theory in the appendix.



2. Preliminaries 

Given a finite alphabet A, we denote by $A^*$ (resp. by $A^+$) the free monoid
(resp. the free semigroup) over A. For a word $u \in A^*$, the smallest $B \subseteq A$ such that
$u \in B^*$ is called the alphabet of u and is denoted by alph(u). A nondeterministic
finite automaton (NFA) over A is denoted by a tuple $\mathcal{A} = (Q, A, I, F, \delta)$, where Q
is the set of states, $I \subseteq Q$ the set of initial states, $F \subseteq Q$ the set of final states
and $\delta \subseteq Q \times A \times Q$ the transition relation. If $\delta$ is a function, then $\mathcal{A}$ is a deterministic
automaton (DFA). We denote by $L(\mathcal{A})$ the language of words accepted by $\mathcal{A}$.
Given a word $u \in A^*$, a subset B of A and two states p, q of $\mathcal{A}$, we denote

— by $p \xrightarrow{u} q$ a path from state p to state q labeled u,

— by $p \xrightarrow{\subseteq B} q$ a path from p to q all of whose transitions are labeled by letters
of B,

— by $p \xrightarrow{=B} q$ a path from p to q all of whose transitions are labeled by letters of
B, with the additional requirement that every letter of B occurs at least once along
this path.
Given a state p, we denote by $\mathrm{scc}(p, \mathcal{A})$ the strongly connected component of p in $\mathcal{A}$
(that is, the set of states q such that q is reachable from p and p is reachable from q),
and by $\mathrm{alph\_scc}(p, \mathcal{A})$ the set of labels of all transitions occurring in this strongly
connected component. Finally, we define the restriction of $\mathcal{A}$ to a subalphabet $B \subseteq A$ by

$$\mathcal{A}|_B \stackrel{\text{def}}{=} (Q, A, I, F, \delta \cap (Q \times B \times Q)).$$

3. A toy example: separation by prefix-testable languages

A regular language L is a prefix-testable language if membership in L can be
tested by inspecting prefixes up to some length, that is, if L is a finite Boolean
combination of languages of the form $uA^*$, for a finite word u. Prefix-testable
languages form a variety of regular languages. Therefore, as recalled in the
introduction, it follows by [2] that testing whether two given languages can be separated
by a prefix-testable language amounts to checking whether their topological closures
in some profinite semigroup intersect: the languages are separable if and only if
these closures are disjoint.

It turns out that for prefix-testable languages, this profinite semigroup is easy to
describe (see [1, Sec. 3.7]): it is $A^+ \cup A^\omega$, where $A^\omega$ denotes the set of right-infinite
words over A. Multiplication in this semigroup is defined as follows: infinite words
are left zeros ($vw = v$ if $v \in A^\omega$), and multiplication on the left by a finite word is
the usual concatenation: $(a_1 \cdots a_n)(b_1 b_2 \cdots) = a_1 \cdots a_n b_1 b_2 \cdots$. Finally, the topology
is the product topology: a sequence converges

— to a finite word u if it is ultimately equal to u,

— to an infinite word v if for every finite prefix x of v, the sequence ultimately
belongs to $x(A^+ \cup A^\omega)$.

Therefore, from a given NFA $\mathcal{A}$, one can compute a Büchi automaton recognizing
the language of infinite words that belong to the closure of $L(\mathcal{A})$, as follows:

(1) Trim $\mathcal{A}$, by removing all states from which one cannot reach a final state. This
can be performed in linear time wrt. the size of $\mathcal{A}$, and does not change the
language recognized by $\mathcal{A}$.

(2) Build the Büchi automaton obtained from the resulting trim automaton by
declaring all states accepting.

This yields a straightforward PTIME (actually NLOGSPACE) algorithm to
decide separability by a prefix-testable language: first check that
$L(\mathcal{A}_1) \cap L(\mathcal{A}_2) = \emptyset$. If so, compute the intersection of the languages of infinite
words belonging to the closures of $L(\mathcal{A}_1)$ and $L(\mathcal{A}_2)$ by the usual product construction,
and check whether this Büchi automaton accepts at least one word. The two languages
are separable if and only if they are disjoint and this Büchi automaton accepts no word.
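The procedure above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: an NFA is encoded as a triple (initial states, final states, set of (source, letter, target) transitions), the name `prefix_testable_separable` is hypothetical, and nonemptiness of the product Büchi automaton (all of whose states are accepting) is tested by looking for a reachable cycle with Kahn's topological-sort algorithm.

```python
from collections import deque


def prefix_testable_separable(nfa1, nfa2):
    """Sketch: can L(nfa1) and L(nfa2) be separated by a prefix-testable language?

    Assumed encoding: nfa = (initial_states, final_states, transitions),
    with transitions a set of (source, letter, target) triples.
    """
    def trim(nfa):
        # Step (1): keep only states from which a final state is reachable.
        init, fin, trans = nfa
        pred = {}
        for p, _, q in trans:
            pred.setdefault(q, set()).add(p)
        alive, todo = set(fin), deque(fin)
        while todo:
            q = todo.popleft()
            for p in pred.get(q, ()):
                if p not in alive:
                    alive.add(p)
                    todo.append(p)
        kept = {(p, a, q) for p, a, q in trans if p in alive and q in alive}
        return {i for i in init if i in alive}, set(fin), kept

    (i1, f1, t1), (i2, f2, t2) = trim(nfa1), trim(nfa2)
    succ1, succ2 = {}, {}
    for p, a, q in t1:
        succ1.setdefault(p, set()).add((a, q))
    for p, a, q in t2:
        succ2.setdefault((p, a), set()).add(q)

    # Explore the reachable part of the product of the two trimmed automata.
    start = {(p, q) for p in i1 for q in i2}
    seen, todo, edges = set(start), deque(start), {}
    while todo:
        p, q = todo.popleft()
        if p in f1 and q in f2:
            return False  # the languages share a finite word
        for a, p2 in succ1.get(p, ()):
            for q2 in succ2.get((q, a), ()):
                edges.setdefault((p, q), set()).add((p2, q2))
                if (p2, q2) not in seen:
                    seen.add((p2, q2))
                    todo.append((p2, q2))

    # Step (2): all states are accepting, so the product Büchi automaton accepts
    # some infinite word iff its reachable part contains a cycle (Kahn's algorithm).
    indeg = {s: 0 for s in seen}
    for targets in edges.values():
        for t in targets:
            indeg[t] += 1
    queue = deque(s for s in seen if indeg[s] == 0)
    removed = 0
    while queue:
        s = queue.popleft()
        removed += 1
        for t in edges.get(s, ()):
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    return removed == len(seen)  # separable iff no reachable cycle
```

For instance, the languages $a^*b$ and $a^*c$ are reported as not separable (both closures contain the infinite word $a^\omega$), while $ab^*$ and $ba^*$ are separable, e.g. by $aA^*$.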

Proposition 1. One can decide in PTIME whether two languages can be separated 
by a prefix-testable language. □ 

4. A simple PTIME algorithm for separation by a piecewise testable language

Piecewise testable languages. Let $\trianglelefteq$ be the scattered subword ordering defined
on $A^*$ as follows: for $u, v \in A^*$, we have $u \trianglelefteq v$ if $u = a_1 \cdots a_n$ and
$v = v_0 a_1 v_1 \cdots v_{n-1} a_n v_n$, with $a_i \in A$ and $v_i \in A^*$. We let

$$\mathrm{Sub}_n(u) = \{w \in A^* : |w| \le n,\ w \trianglelefteq u\}.$$

When two words have the same scattered subwords up to length n, we say that
they are $\sim_n$-equivalent:

$$u \sim_n v \iff \mathrm{Sub}_n(u) = \mathrm{Sub}_n(v).$$

A regular language over an alphabet A is piecewise testable (PT) [18] if it is a
finite Boolean combination of languages of the form $A^*a_1A^*a_2 \cdots A^*a_nA^*$, where
every $a_i \in A$. Whether a given word belongs to a PT-language is thus determined
by the set of its scattered subwords up to a certain length. In other words, a
regular language L is piecewise testable if and only if there exists an $n \in \mathbb{N}$ such
that L is a union of $\sim_n$-classes.
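The sets $\mathrm{Sub}_n(u)$ and the relation $\sim_n$ can be computed directly for small examples. The following Python sketch (function names `sub_n` and `equiv_n` are hypothetical) relies on the fact that `itertools.combinations` keeps the letters of u in their original order, so each tuple it yields is a scattered subword; the enumeration is exponential in n and only meant as an illustration.

```python
from itertools import combinations


def sub_n(u, n):
    """Sub_n(u): all scattered subwords of u of length at most n."""
    # combinations(u, k) picks k positions of u in increasing order,
    # i.e. exactly the scattered subwords of length k (with repetitions).
    return {"".join(c) for k in range(n + 1) for c in combinations(u, k)}


def equiv_n(u, v, n):
    """u ~_n v  iff  Sub_n(u) == Sub_n(v)."""
    return sub_n(u, n) == sub_n(v, n)
```

For example, ab and ba are $\sim_1$-equivalent (both have alphabet {a, b}) but not $\sim_2$-equivalent, since ab is a scattered subword of the first word only.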

The class of piecewise testable languages has been extensively studied during
the last decades. It corresponds to languages that can be defined in the fragment
$\mathcal{B}\Sigma_1(<)$ of first-order logic on finite words. Simon has shown that piecewise
testable languages are exactly those languages whose syntactic monoid is
$\mathcal{J}$-trivial [18], and this property yields a decision procedure to check whether a
language is piecewise testable. Stern has refined this procedure into a polynomial
time algorithm [21], whose complexity has been improved by Trahtman [22].

Separation by a piecewise testable language. We say that two regular languages
$L_1, L_2$ are PT-separable if there exists a piecewise testable language L that
separates them, i.e.,

$$L_1 \subseteq L \quad\text{and}\quad L_2 \cap L = \emptyset.$$

In other words, $L_1$ and $L_2$ are PT-separable if there exists a $\mathcal{B}\Sigma_1(<)$ formula
which is satisfied by all words of $L_1$, and not satisfied by any word of $L_2$.

Our main contribution is a simple proof of the following result, which states 
that one can decide in polynomial time whether two languages are PT-separable. 

Theorem 1. Given two NFAs, one can determine in polynomial time, with re- 
spect to the number of states and the size of the alphabet, whether the languages 
recognized by these NFAs are PT-separable. 

Note that a language is PT-separable from its complement if and only if it is 
piecewise testable itself. Therefore, applying Theorem 1 to a language and to its 
complement if they are both given by NFAs yields a polynomial time algorithm 
to check if a language is piecewise testable. We recover in particular the following
result, proved by Stern [21] using the characterization of minimal automata
recognizing PT-languages given by Simon in [18] (this result was later improved
by Trahtman [22]).

Corollary 1. One can decide in polynomial time whether a given DFA recognizes 
a piecewise testable language. 

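The rest of this section is devoted to the proof of Theorem 1. We fix a DFA
$\mathcal{A}$ over A. For $u_0, \ldots, u_p \in A^*$ and nonempty subalphabets $B_1, \ldots, B_p \subseteq A$, let
$\vec{u} = (u_0, \ldots, u_p)$ and $\vec{B} = (B_1, \ldots, B_p)$. We call such a pair $(\vec{u}, \vec{B})$ a factorization
pattern. A $(\vec{u}, \vec{B})$-path in $\mathcal{A}$ is a successful path (leading from the initial state $i$ to
a final state $f$ of $\mathcal{A}$), of the form

$$i \xrightarrow{u_0} p_1 \xrightarrow{\subseteq B_1} q_1 \xrightarrow{=B_1} q_1 \xrightarrow{\subseteq B_1} r_1 \xrightarrow{u_1} p_2 \;\cdots\; p_p \xrightarrow{\subseteq B_p} q_p \xrightarrow{=B_p} q_p \xrightarrow{\subseteq B_p} r_p \xrightarrow{u_p} f$$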




Figure 1. A $(\vec{u}, \vec{B})$-path



Recall that edges denote sequences of transitions: an edge labeled $\subseteq B$ denotes
a path of which all transitions are labeled by letters of B. An edge labeled $=B$
denotes a path of which all transitions are labeled by letters of B, with the
additional requirement that every letter of B occurs at least once.

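Remark 1. The automaton $\mathcal{A}$ admits a $(\vec{u}, \vec{B})$-path if and only if $L(\mathcal{A})$ contains
a language of the form

$$u_0(x_1y_1^*z_1)u_1 \cdots u_{p-1}(x_py_p^*z_p)u_p,$$

where $\mathrm{alph}(x_i) \cup \mathrm{alph}(z_i) \subseteq \mathrm{alph}(y_i) = B_i$.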

Theorem 1 directly follows from the next two statements. 

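Proposition 2. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be two NFAs. Then, $L(\mathcal{A}_1)$ and $L(\mathcal{A}_2)$ are not
PT-separable if and only if there exist $\vec{u} = (u_0, \ldots, u_p)$ and $\vec{B} = (B_1, \ldots, B_p)$ such
that both $\mathcal{A}_1$ and $\mathcal{A}_2$ have a $(\vec{u}, \vec{B})$-path.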

Proposition 3. Given two NFAs, one can determine in polynomial time, with
respect to the number of states and the size of the alphabet, whether there exist
$\vec{u} = (u_0, \ldots, u_p)$ and $\vec{B} = (B_1, \ldots, B_p)$ such that both NFAs admit a $(\vec{u}, \vec{B})$-path.

As observed above, the characterization of PT-separable languages given in 
Proposition 2 can be applied to the minimal automata of a regular language and 
of its complement, to obtain a characterization for minimal automata recognizing 
PT-languages. It turns out that with this approach, we retrieve exactly the same 
characterization as given by Simon in [18]. 

Let us first prove Proposition 3. 

Proof of Prop. 3. We will first show that the following problem is in PTIME: given
states $p_1, q_1, r_1$ in automaton $\mathcal{A}_1$ and $p_2, q_2, r_2$ in automaton $\mathcal{A}_2$, determine whether
there exists a nonempty alphabet $B \subseteq A$ such that there is an $(=B)$-loop around
both $q_1$ and $q_2$, and $(\subseteq B)$-paths from $p_1$ to $r_1$ via $q_1$ in $\mathcal{A}_1$, and from $p_2$ to $r_2$ via
$q_2$ in $\mathcal{A}_2$, as pictured in Figure 2.
$$p_i \xrightarrow{\subseteq B} q_i \xrightarrow{=B} q_i \xrightarrow{\subseteq B} r_i \qquad (i = 1, 2)$$

Figure 2. Finding a common pattern in $\mathcal{A}_1$ and $\mathcal{A}_2$

To do so, we compute a decreasing sequence $(C_i)_i$ of alphabets over-approximating
the maximal alphabet B labeling the loops. Note that if there exists such an
alphabet B, it should be contained in

$$C_1 \stackrel{\text{def}}{=} \mathrm{alph\_scc}(q_1, \mathcal{A}_1) \cap \mathrm{alph\_scc}(q_2, \mathcal{A}_2).$$

Using Tarjan's algorithm to compute strongly connected components in linear
time [8], one can compute $C_1$ in linear time as well. Then, we restrict the automata
to alphabet $C_i$, and we repeat the process to obtain the sequence $(C_i)_i$:

$$C_{i+1} \stackrel{\text{def}}{=} \mathrm{alph\_scc}(q_1, \mathcal{A}_1|_{C_i}) \cap \mathrm{alph\_scc}(q_2, \mathcal{A}_2|_{C_i}).$$

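After a finite number n of iterations, we obtain $C_n = C_{n+1}$. Note that
$n \le |\mathrm{alph}(\mathcal{A}_1) \cap \mathrm{alph}(\mathcal{A}_2)| \le |A|$. If $C_n = \emptyset$, then there exists no nonempty B for which
there is an $(=B)$-loop around both $q_1$ and $q_2$. If $C_n \neq \emptyset$, then it is the maximal
nonempty alphabet B such that there are $(=B)$-loops around $q_1$ in $\mathcal{A}_1$ and $q_2$ in
$\mathcal{A}_2$: indeed, every such B is contained in every $C_i$, and since $C_n = \mathrm{alph\_scc}(q_j, \mathcal{A}_j|_{C_n})$
for $j = 1, 2$, the strongly connected component of $q_j$ in $\mathcal{A}_j|_{C_n}$ carries a loop around $q_j$
using every letter of $C_n$. Since every $(\subseteq B)$-path is also a $(\subseteq C_n)$-path, it then remains
to determine whether there exist paths $p_1 \xrightarrow{\subseteq C_n} q_1 \xrightarrow{\subseteq C_n} r_1$ and
$p_2 \xrightarrow{\subseteq C_n} q_2 \xrightarrow{\subseteq C_n} r_2$, which can be performed in linear time.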

To sum up, since the number n of iterations needed to reach $C_n = C_{n+1}$ is bounded
by |A|, and since each iteration is linear wrt. the size of $\mathcal{A}_1$ and $\mathcal{A}_2$, deciding
whether there is a pattern as in Figure 2 in both $\mathcal{A}_1$ and $\mathcal{A}_2$ can be done in
polynomial time wrt. both |A| and the size of the NFAs.
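The fixpoint computation of $(C_i)_i$ and the subsequent path checks can be sketched in Python. This is an illustration under assumptions of ours: each automaton is given only by its set of (source, letter, target) transitions, all function names are hypothetical, and the strongly connected component of a state is obtained as the intersection of a forward and a backward breadth-first search (still linear per iteration) instead of Tarjan's algorithm.

```python
from collections import deque


def reachable(trans, start, reverse=False):
    """States reachable from `start` (or from which `start` is reachable if reverse)."""
    succ = {}
    for p, _, q in trans:
        src, dst = (q, p) if reverse else (p, q)
        succ.setdefault(src, []).append(dst)
    seen, todo = {start}, deque([start])
    while todo:
        s = todo.popleft()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def restrict(trans, alphabet):
    """Transitions of the restricted automaton A|_alphabet."""
    return {(p, a, q) for p, a, q in trans if a in alphabet}


def alph_scc(trans, q):
    """Labels of the transitions lying inside the strongly connected component of q."""
    scc = reachable(trans, q) & reachable(trans, q, reverse=True)
    return {a for p, a, r in trans if p in scc and r in scc}


def max_common_loop_alphabet(trans1, q1, trans2, q2):
    """Iterate C_{i+1} = alph_scc(q1, A1|C_i) & alph_scc(q2, A2|C_i) until it stabilizes."""
    c = alph_scc(trans1, q1) & alph_scc(trans2, q2)  # C_1
    while True:
        nxt = alph_scc(restrict(trans1, c), q1) & alph_scc(restrict(trans2, c), q2)
        if nxt == c:  # C_{n+1} = C_n
            return c
        c = nxt


def common_pattern(trans1, p1, q1, r1, trans2, p2, q2, r2):
    """Nonempty B with p_i -(⊆B)-> q_i -(=B)-> q_i -(⊆B)-> r_i for i = 1, 2?"""
    c = max_common_loop_alphabet(trans1, q1, trans2, q2)
    if not c:
        return False
    for trans, p, q, r in ((trans1, p1, q1, r1), (trans2, p2, q2, r2)):
        tc = restrict(trans, c)
        if q not in reachable(tc, p) or r not in reachable(tc, q):
            return False
    return True
```

For instance, if $q_1$ lies on a cycle reading a then b together with a c-self-loop in $\mathcal{A}_1$, while $q_2$ only has a- and c-self-loops in $\mathcal{A}_2$, the sketch computes $C_1 = \{a, c\}$, then $C_2 = C_3 = \{c\}$: once b is removed, the a-transition no longer lies on a cycle through $q_1$.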

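Now we build from $\mathcal{A}_1$ and $\mathcal{A}_2$ two new automata $\mathcal{A}'_1$ and $\mathcal{A}'_2$ as follows. The
procedure first initializes $\mathcal{A}'_i$ as a copy of $\mathcal{A}_i$. Denote by $Q_i$ the state set of $\mathcal{A}_i$.
For each 4-tuple $r = (p_1, r_1, p_2, r_2) \in Q_1 \times Q_1 \times Q_2 \times Q_2$ such that there exist a nonempty
alphabet B, two states $q_1 \in Q_1$, $q_2 \in Q_2$ and paths $p_i \xrightarrow{\subseteq B} q_i \xrightarrow{=B} q_i \xrightarrow{\subseteq B} r_i$ both for
$i = 1$ and $i = 2$, we add in both $\mathcal{A}'_1$ and $\mathcal{A}'_2$ a new letter $a_r$ to the alphabet, and
transitions $p_1 \xrightarrow{a_r} r_1$ and $p_2 \xrightarrow{a_r} r_2$. Since there is a polynomial number of tuples
$(p_1, q_1, r_1, p_2, q_2, r_2)$, the above shows that computing these new transitions can be
performed in polynomial time. Therefore, computing $\mathcal{A}'_1$ and $\mathcal{A}'_2$ can be done in
PTIME.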

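Now, by construction, there exists some factorization pattern $(\vec{u}, \vec{B})$ such that
$\mathcal{A}_1$ and $\mathcal{A}_2$ both have a $(\vec{u}, \vec{B})$-path if and only if $L(\mathcal{A}'_1) \cap L(\mathcal{A}'_2) \neq \emptyset$. Since both
$\mathcal{A}'_1$ and $\mathcal{A}'_2$ have been built in PTIME, and since emptiness of the intersection of two
NFAs can be tested in polynomial time by the product construction, this can be decided
in polynomial time. □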
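The construction of $\mathcal{A}'_1$, $\mathcal{A}'_2$ and the final emptiness test can be sketched as follows. Assumptions (ours, for illustration only): NFAs are triples (initial states, final states, set of (source, letter, target) transitions), each fresh letter $a_r$ is encoded as a Python tuple so that it cannot clash with ordinary string letters, and the polynomial-time test for a common pattern as in Figure 2 (which quantifies over $q_1$, $q_2$ and B) is abstracted as a hypothetical callable `pattern_exists(p1, r1, p2, r2)`.

```python
from collections import deque
from itertools import product


def states_of(nfa):
    """All states mentioned in nfa = (initial, final, transitions)."""
    init, fin, trans = nfa
    return set(init) | set(fin) | {s for p, _, q in trans for s in (p, q)}


def augment(nfa1, nfa2, pattern_exists):
    """Build A1', A2': one fresh letter a_r per tuple r = (p1, r1, p2, r2) with a common pattern."""
    (i1, f1, t1), (i2, f2, t2) = nfa1, nfa2
    t1, t2 = set(t1), set(t2)
    q1s, q2s = states_of(nfa1), states_of(nfa2)
    for p1, r1, p2, r2 in product(q1s, q1s, q2s, q2s):
        if pattern_exists(p1, r1, p2, r2):
            fresh = ("a_r", p1, r1, p2, r2)  # a tuple cannot clash with a string letter
            t1.add((p1, fresh, r1))
            t2.add((p2, fresh, r2))
    return (i1, f1, t1), (i2, f2, t2)


def intersection_nonempty(nfa1, nfa2):
    """Breadth-first search in the product automaton for a pair of final states."""
    (i1, f1, t1), (i2, f2, t2) = nfa1, nfa2
    succ1, succ2 = {}, {}
    for p, a, q in t1:
        succ1.setdefault(p, []).append((a, q))
    for p, a, q in t2:
        succ2.setdefault((p, a), []).append(q)
    start = {(p, q) for p in i1 for q in i2}
    seen, todo = set(start), deque(start)
    while todo:
        p, q = todo.popleft()
        if p in f1 and q in f2:
            return True
        for a, p2 in succ1.get(p, ()):
            for q2 in succ2.get((q, a), ()):
                if (p2, q2) not in seen:
                    seen.add((p2, q2))
                    todo.append((p2, q2))
    return False


def pt_separable(nfa1, nfa2, pattern_exists):
    """Not PT-separable iff L(A1') and L(A2') intersect (Propositions 2 and 3)."""
    return not intersection_nonempty(*augment(nfa1, nfa2, pattern_exists))
```

For example, $a(bb)^*c$ and $ab(bb)^*c$ are disjoint, but both automata have b-loops on states {1, 3}; the augmented automata then share the word $a\,a_r\,c$ with $r = (1, 1, 1, 3)$, so the sketch reports them as not PT-separable.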



8 L. VAN ROOIJEN AND M. ZEITOUN 

As a side remark, let us mention that it is crucial that the $(=B)$-paths, which
are required to use exactly the same alphabets, are actually loops (occurring in
Figure 2 around states $q_1$ and $q_2$). The next statement shows that even for DFAs,
the problem is NP-hard if we are looking for paths labeled by a common alphabet,
without requiring these paths to be loops. The proof is deferred to the Appendix.

Lemma 1. The following problem is NP-complete:

Input: An alphabet $A = \{a_1, a_2, \ldots, a_n\}$ and two DFAs $\mathcal{A}_1, \mathcal{A}_2$ over A.
Question: Do there exist $u \in L(\mathcal{A}_1)$ and $v \in L(\mathcal{A}_2)$ such that $\mathrm{alph}(u) = \mathrm{alph}(v)$?

Let us now prove Proposition 2. We first prove the "if" direction; the "only
if" direction is proved in Lemma 6.

Lemma 2. If two NFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ share a common $(\vec{u}, \vec{B})$-path, then the
languages $L(\mathcal{A}_1)$ and $L(\mathcal{A}_2)$ are not PT-separable.

Proof. Let L be a piecewise testable language such that $L(\mathcal{A}_1) \subseteq L$. Using the
hypothesis and Remark 1, this implies that L contains a language

$$u_0(x_1y_1^*z_1)u_1 \cdots u_{p-1}(x_py_p^*z_p)u_p,$$

where $\mathrm{alph}(x_i) \cup \mathrm{alph}(z_i) \subseteq \mathrm{alph}(y_i) = B_i$. Similarly, $L(\mathcal{A}_2)$ contains a language
$u_0(x'_1y'^*_1z'_1)u_1 \cdots u_{p-1}(x'_py'^*_pz'_p)u_p$, where $\mathrm{alph}(x'_i) \cup \mathrm{alph}(z'_i) \subseteq \mathrm{alph}(y'_i) = B_i$. We
will show that for every n, there is an element in this language which is $\sim_n$-equivalent
to an element of $u_0(x_1y_1^nz_1)u_1 \cdots u_{p-1}(x_py_p^nz_p)u_p$, using the following
claim.

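Claim 1. Let $x, x', y, y', z, z' \in A^*$ satisfy

$$\mathrm{alph}(x) \cup \mathrm{alph}(z) \subseteq \mathrm{alph}(y), \qquad \mathrm{alph}(x') \cup \mathrm{alph}(z') \subseteq \mathrm{alph}(y') = \mathrm{alph}(y).$$

Then for every $n \in \mathbb{N}$,

$$xy^nz \sim_n x'y'^nz'.$$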

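Indeed, denote by $\mathrm{alph}(y)^{\le n}$ the set of words of length at most n over $\mathrm{alph}(y)$.
From the inclusions

$$\mathrm{alph}(y)^{\le n} = \mathrm{Sub}_n(y^n) \subseteq \mathrm{Sub}_n(xy^nz) \subseteq \mathrm{alph}(y)^{\le n},$$

where the equality holds because the i-th letter of a word of length at most n over
$\mathrm{alph}(y)$ can be found in the i-th copy of y, and the last inclusion uses
$\mathrm{alph}(x) \cup \mathrm{alph}(z) \subseteq \mathrm{alph}(y)$, it follows that $\mathrm{Sub}_n(xy^nz) = \mathrm{alph}(y)^{\le n}$. In the same way,
$\mathrm{Sub}_n(x'y'^nz') = \mathrm{alph}(y')^{\le n}$, which is equal to $\mathrm{alph}(y)^{\le n}$. Thus $xy^nz \sim_n x'y'^nz'$. This
establishes the claim.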
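As a small worked instance of Claim 1, with words chosen here only for illustration, take $n = 2$, $x = a$, $y = ab$, $z = b$ and $x' = b$, $y' = ba$, $z' = \varepsilon$, so that both hypotheses hold with $\mathrm{alph}(y) = \mathrm{alph}(y') = \{a, b\}$:

```latex
\begin{aligned}
xy^2z &= a\,(ab)^2\,b = aababb, \qquad x'y'^2z' = b\,(ba)^2 = bbaba,\\
\mathrm{Sub}_2(aababb) &= \mathrm{Sub}_2(bbaba) = \{\varepsilon, a, b, aa, ab, ba, bb\} = \{a,b\}^{\le 2},
\end{aligned}
```

hence $aababb \sim_2 bbaba$, as predicted by the claim.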

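Applying this, we obtain that $x_iy_i^nz_i \sim_n x'_iy'^n_iz'_i$ for every i. Since $\sim_n$ is a
congruence, we obtain for all $n \in \mathbb{N}$:

$$u_0(x_1y_1^nz_1)u_1 \cdots u_{p-1}(x_py_p^nz_p)u_p \sim_n u_0(x'_1y'^n_1z'_1)u_1 \cdots u_{p-1}(x'_py'^n_pz'_p)u_p.$$

Since L is piecewise testable, it is a union of $\sim_n$-equivalence classes for some n. The
left-hand word belongs to $L(\mathcal{A}_1) \subseteq L$, hence so does the right-hand word, which belongs
to $L(\mathcal{A}_2)$. Thus L cannot be disjoint from $L(\mathcal{A}_2)$. □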

To prove the other direction of Proposition 2, we introduce some notation. For
$B \subseteq A$, let us denote by $B^\circledast$ the set of words with alphabet exactly B:

$$B^\circledast = \{w \in B^* \mid \mathrm{alph}(w) = B\}.$$
Given a factorization pattern $(\vec{u}, \vec{B})$, with $\vec{u} = (u_0, \ldots, u_p)$ and $\vec{B} = (B_1, \ldots, B_p)$,
we let

$$L(\vec{u}, \vec{B}, n) = u_0(B_1^\circledast)^nu_1 \cdots u_{p-1}(B_p^\circledast)^nu_p.$$

We say that a sequence $(w_n)_n$ is $(\vec{u}, \vec{B})$-adequate if

$$\forall n \ge 0,\quad w_n \in L(\vec{u}, \vec{B}, n).$$

A sequence is called adequate if it is $(\vec{u}, \vec{B})$-adequate for some factorization
pattern $(\vec{u}, \vec{B})$.
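To make the languages $L(\vec{u}, \vec{B}, n)$ concrete, here is a small membership test. It is our own sketch, not part of the original argument: words are Python strings, $\vec{u}$ is a list of p + 1 strings, $\vec{B}$ a list of p nonempty sets of letters, and the names `exact_alphabet_ends` and `in_L` are hypothetical. It tracks the set of positions of w reachable after each stage, in the spirit of a subset construction.

```python
def exact_alphabet_ends(w, i, b):
    """All j > i such that the factor w[i:j] has alphabet exactly b."""
    ends, seen = [], set()
    for j in range(i, len(w)):
        if w[j] not in b:
            break  # every longer factor would contain a letter outside b
        seen.add(w[j])
        if seen == b:
            ends.append(j + 1)
    return ends


def in_L(w, u, B, n):
    """Test w in L(u, B, n) = u_0 (B_1^⊛)^n u_1 ... u_{p-1} (B_p^⊛)^n u_p."""
    # pos holds the positions of w that can be reached after each stage.
    pos = {len(u[0])} if w.startswith(u[0]) else set()
    for b, u_next in zip(B, u[1:]):
        b = set(b)
        for _ in range(n):  # n consecutive factors, each with alphabet exactly b
            pos = {j for i in pos for j in exact_alphabet_ends(w, i, b)}
        pos = {i + len(u_next) for i in pos if w.startswith(u_next, i)}
    return len(w) in pos
```

For example, aababc belongs to $a(\{a,b\}^\circledast)^2c$ (split as a · ab · ab · c) but not to $a(\{a,b\}^\circledast)^3c$.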

Lemma 3. Every sequence $(w_n)_n$ of words admits an adequate subsequence.

Proof. We use Simon's Factorization Forest Theorem, which we recall. See [19, 13,
7] for proofs and extensions of this theorem. A factorization tree of a nonempty
word x is a finite ordered unranked tree T(x) whose nodes are labeled by nonempty
words, and such that:

— all leaves of T(x) are labeled by letters,

— all internal nodes of T(x) have at least 2 children,

— if a node labeled y has k children labeled $y_1, \ldots, y_k$ from left to right, then
$y = y_1 \cdots y_k$.

Given a semigroup morphism $\varphi : A^+ \to S$ into a finite semigroup S, such a
factorization tree is $\varphi$-Ramseyan if every internal node has either 2 children, or k
children labeled $y_1, \ldots, y_k$, in which case $\varphi$ maps all words $y_1, \ldots, y_k$ to the same
idempotent of S. Simon's Factorization Forest Theorem states that every word
has a $\varphi$-Ramseyan factorization tree of height at most $3|S|$.

Let now $(w_n)_n$ be a sequence of words. We use Simon's Factorization Forest
Theorem with the morphism $\mathrm{alph} : A^+ \to 2^A$, where $2^A$ is a semigroup under union.

Consider a sequence $(T(w_n))_n$, where $T(w_n)$ is an alph-Ramseyan tree given
by the Factorization Forest Theorem. In particular, $T(w_n)$ has height at most
$3 \cdot 2^{|A|}$. Therefore, extracting a subsequence if necessary, one may assume that the
sequence of heights of the trees $T(w_n)$ is a constant H. We argue by induction on
H. If $H = 0$, then every $w_n$ is a letter. Hence, one may extract from $(w_n)_n$ a
constant subsequence, which is therefore adequate.

We denote the arity of the root of $T(w_n)$ by $\mathrm{arity}(w_n)$, and we call it the arity
of $w_n$. If $H > 0$, two cases may arise:

(1) One can extract from $(w_n)_n$ a subsequence of bounded arity. Therefore, one
may extract from $(w_n)_n$ a subsequence of constant arity, say K. This implies
that each $w_n$ has a factorization in K factors

$$w_n = w_{n,1} \cdots w_{n,K},$$

where $w_{n,i}$ is the label of the i-th child of the root in $T(w_n)$. Therefore, the
alph-Ramseyan subtree of each $w_{n,i}$ is of height at most $H - 1$. By induction, one
can extract from $(w_{n,i})_n$ an adequate subsequence. Proceeding iteratively for
$i = 1, 2, \ldots, K$, one extracts from $(w_n)_n$ a subsequence $(w_{\sigma(n)})_n$ such that every
$(w_{\sigma(n),i})_n$ is adequate. But a finite product of adequate sequences is obviously
adequate. Therefore, the subsequence $(w_{\sigma(n)})_n$ of $(w_n)_n$ is also adequate.
(2) The arity of $w_n$ grows to infinity. Therefore, extracting if necessary, one can
assume that $\mathrm{arity}(w_n) \ge \max(n, 3)$ for every n. Since all arities of words in the
sequence are at least 3, all children of the root map to the same idempotent
of $2^A$. But this says that each word from the subsequence is of the form

$$w_{\sigma(n)} = w_{n,1} \cdots w_{n,K_n},$$

with $K_n \ge n$, and where the alphabet of $w_{n,i}$ is the same for all i, say $B_n$. Since there
are finitely many subalphabets, extracting once more we may assume that $B_n = B$ does
not depend on n. Therefore, $w_{\sigma(n)} \in (B^\circledast)^{K_n} \subseteq (B^\circledast)^n$, and $(w_{\sigma(n)})_n$ is adequate. □

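We now say that a factorization pattern $(\vec{u}, \vec{B})$ is proper if

(1) for all i, $\mathrm{last}(u_{i-1}) \notin B_i$ and $\mathrm{first}(u_i) \notin B_i$ (whenever these words are nonempty),

(2) for all i, $u_i = \varepsilon \Rightarrow (B_{i+1} \not\subseteq B_i \text{ and } B_i \not\subseteq B_{i+1})$.

Note that if a sequence $(w_n)_n$ is adequate, then there exists a proper factorization
pattern $(\vec{u}, \vec{B})$ such that $(w_n)_n$ is $(\vec{u}, \vec{B})$-adequate. This is easily seen from
the following observations and their symmetric counterparts:

$$u = a_1 \cdots a_k \text{ with } a_k \in B \implies u\,B^\circledast \subseteq a_1 \cdots a_{k-1}B^\circledast,$$
$$B_{i+1} \subseteq B_i \implies B_i^\circledast B_{i+1}^\circledast \subseteq B_i^\circledast.$$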
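The observations above suggest a normalization procedure that turns any factorization pattern into a proper one, by repeatedly absorbing boundary letters into the neighbouring block and merging adjacent blocks whose alphabets are comparable. The following Python sketch is our own illustration, not taken from the paper: $\vec{u}$ is a list of p + 1 strings, $\vec{B}$ a list of p sets with `B[j]` sitting between `u[j]` and `u[j+1]`, and the name `make_proper` is hypothetical. For every $n \ge 1$, each rewriting step can only enlarge $L(\vec{u}, \vec{B}, n)$.

```python
def make_proper(u, B):
    """Normalize a factorization pattern (u, B) into a proper one (sketch)."""
    u, B = list(u), [frozenset(b) for b in B]
    changed = True
    while changed:
        changed = False
        # Condition (1): absorb a boundary letter of u_{i-1} or u_i that lies in B_i.
        for j, b in enumerate(B):
            if u[j] and u[j][-1] in b:
                u[j] = u[j][:-1]
                changed = True
            if u[j + 1] and u[j + 1][0] in b:
                u[j + 1] = u[j + 1][1:]
                changed = True
        # Condition (2): merge two blocks separated by an empty word
        # when one alphabet is included in the other.
        for j in range(1, len(B)):
            if u[j] == "" and B[j] <= B[j - 1]:
                del u[j], B[j]  # later block absorbed by the earlier one
                changed = True
                break
            if u[j] == "" and B[j - 1] <= B[j]:
                del u[j], B[j - 1]  # earlier block absorbed by the later one
                changed = True
                break
    return u, B
```

For instance, the pattern with $\vec{u} = (ab, \varepsilon, c)$ and $\vec{B} = (\{b\}, \{b, c\})$ is normalized to $\vec{u} = (a, \varepsilon)$ and $\vec{B} = (\{b, c\})$. The procedure terminates because each step shortens some $u_i$ or removes a block.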

The following lemma gives a condition under which two sequences share a fac- 
torization pattern. This lemma is very similar to [1, Theorem 8.2.6]. 

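Lemma 4. Let $(\vec{u}, \vec{B})$ and $(\vec{t}, \vec{C})$ be proper factorization patterns. Let $(v_n)_n$ and
$(w_n)_n$ be two sequences of words such that

— $(v_n)_n$ is $(\vec{u}, \vec{B})$-adequate,

— $(w_n)_n$ is $(\vec{t}, \vec{C})$-adequate,

— $v_n \sim_n w_n$ for every $n \ge 0$.

Then, $\vec{u} = \vec{t}$ and $\vec{B} = \vec{C}$.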

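Proof. For a factorization pattern $(\vec{u}, \vec{B})$ with $\vec{B} = (B_1, \ldots, B_p)$, we define

$$\|(\vec{u}, \vec{B})\| := \Big(\sum_{i=0}^{p} |u_i|\Big) + p.$$

Let

$$k := \max(\|(\vec{u}, \vec{B})\|, \|(\vec{t}, \vec{C})\|) + 1.$$

Consider the first word of the sequence $(v_n)_n$, i.e., $v_0 = u_0b_1u_1 \cdots b_pu_p$, where
$\mathrm{alph}(b_i) = B_i$. Define

$$v_0^{(k)} := u_0b_1^ku_1 \cdots b_p^ku_p.$$

Recall that $(v_n)_n$ being a $(\vec{u}, \vec{B})$-adequate sequence means that

$$v_n \in u_0(B_1^\circledast)^nu_1 \cdots u_{p-1}(B_p^\circledast)^nu_p$$

for every n. Thus, setting $\ell := k \cdot \max(|b_1|, \ldots, |b_p|)$, we have $v_0^{(k)} \trianglelefteq v_{\ell'}$ for every
$\ell' \ge \ell$. Note that whenever $\ell' \ge \max(\ell, |v_0^{(k)}|)$, we have that $v_0^{(k)} \in \mathrm{Sub}_{\ell'}(v_{\ell'})$. And, using
the assumption that $v_n \sim_n w_n$ for all n, this gives that $v_0^{(k)} \trianglelefteq w_{\ell'}$. In the same way, for
$w_0 = t_0c_1t_1 \cdots c_qt_q$ and $w_0^{(k)} := t_0c_1^kt_1 \cdots c_q^kt_q$, we obtain an index m such that for every
$m' \ge \max(m, |w_0^{(k)}|)$, both $w_0^{(k)} \trianglelefteq w_{m'}$ and $w_0^{(k)} \trianglelefteq v_{m'}$ hold.

Let $M := \max(\ell, |v_0^{(k)}|, m, |w_0^{(k)}|)$. Then $v_0^{(k)} \trianglelefteq v_M, w_M$ and $w_0^{(k)} \trianglelefteq v_M, w_M$.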

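Now fix a factor $b_i^k$ of $v_0^{(k)}$. In particular, $b_i^k \trianglelefteq w_M$. Since $k > \|(\vec{t}, \vec{C})\|$ and
$|b_i| > 0$, the pigeonhole principle gives that one of the k copies of $b_i$ is read inside a
single block $(C_j^\circledast)^M$ of $w_M$, so that $\mathrm{alph}(b_i) \subseteq C_j$ for some j.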

Exploiting this, we want to define a bijection between the set of indexed alphabets
in $\vec{B}$ and the set of those in $\vec{C}$ that will help us to show that $(\vec{u}, \vec{B}) = (\vec{t}, \vec{C})$.

Let $\mathfrak{B} := \{(B_1, 1), \ldots, (B_p, p)\}$ and $\mathfrak{C} := \{(C_1, 1), \ldots, (C_q, q)\}$. We define a
function $f : \mathfrak{B} \to \mathfrak{C}$ by sending $(B_i, i)$ to the pair $(C_j, j)$ for which the factor
$d_j \in (C_j^\circledast)^M$ of $w_M$ is the first factor used to fully read a copy of $b_i$, while reading
$v_0^{(k)}$ as a scattered subword of $w_M$.

The function $g : \mathfrak{C} \to \mathfrak{B}$ is defined analogously. The functions f and g preserve
the order of the indices and pointwise preserve the alphabet. If we show that f and
g define a bijective correspondence between $\mathfrak{B}$ and $\mathfrak{C}$, then $p = q$. The fact that
f and g pointwise preserve the alphabet would then imply that $B_i = C_i$, for every i.

To establish that f and g are each other's inverses, we apply Lemma 8.2.5 from
[1], which we first recall:

Lemma 5 ([1, Lemma 8.2.5]). Let X and Y be finite sets and let P be a partially
ordered set. Let $f : X \to Y$, $g : Y \to X$, $p : X \to P$ and $q : Y \to P$ be functions
such that

(1) for any $x \in X$, $p(x) \le q(f(x))$,

(2) for any $y \in Y$, $q(y) \le p(g(y))$,

(3) if $x_1, x_2 \in X$, $f(x_1) = f(x_2)$ and $p(x_1) = q(f(x_1))$, then $x_1 = x_2$,

(4) if $y_1, y_2 \in Y$, $g(y_1) = g(y_2)$ and $q(y_1) = p(g(y_1))$, then $y_1 = y_2$.

Then f and g are mutually inverse functions and $p = q \circ f$ and $q = p \circ g$.

The functions f and g fulfill the conditions of this lemma, if we let $X = \mathfrak{B}$, $Y = \mathfrak{C}$,
let P be the set of alphabets, partially ordered by inclusion, and let p and q
be the projections onto the first coordinate:

(1) and (2) hold since f and g pointwise preserve the alphabet. Suppose that
$f(B_{i_1}) = f(B_{i_2})$ with $i_1 \le i_2$, and that $B_{i_1} = f(B_{i_1})$. This means that a factor $b_{i_1}$ and a
factor $b_{i_2}$ of $v_0^{(k)}$ are read inside the same factor $d_j$ of $w_M$. Thus
$\mathrm{alph}(b_{i_1}u_{i_1} \cdots b_{i_2}) \subseteq \mathrm{alph}(d_j) = f(B_{i_1}) = B_{i_1} = \mathrm{alph}(b_{i_1})$. If $i_1 < i_2$, this forces
$\mathrm{first}(u_{i_1}) \in B_{i_1}$ or, if $u_{i_1} = \varepsilon$, $B_{i_1+1} \subseteq B_{i_1}$. But we assumed that $(\vec{u}, \vec{B})$ is a proper
factorization pattern, so $i_1$ must be equal to $i_2$. This shows that (3) holds, and (4)
is proved similarly.

It follows that indeed f and g define a bijective correspondence between $\mathfrak{B}$
and $\mathfrak{C}$, thus $p = q$ and $B_i = C_i$, for every i. Since we are dealing with proper
factorization patterns, $v_0^{(k)} \trianglelefteq w_M$ now implies that $t_i \trianglelefteq u_i$, for every i. On the
other hand, $w_0^{(k)} \trianglelefteq v_M$ now implies that $u_i \trianglelefteq t_i$, for every i. Thus, for every i,
$u_i = t_i$. □

Now we are equipped to prove the "only if" direction of Proposition 2. 
Lemma 6. If the languages recognized by two DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ are not
PT-separable, then $\mathcal{A}_1$ and $\mathcal{A}_2$ share a common $(\vec{u}, \vec{B})$-path.

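Proof. By hypothesis, for every $n \in \mathbb{N}$, there exist $v_n \in L(\mathcal{A}_1)$ and $w_n \in L(\mathcal{A}_2)$
such that

(1) $v_n \sim_n w_n$.

Indeed, otherwise the union of the $\sim_n$-classes of the words of $L(\mathcal{A}_1)$ would be a
piecewise testable language separating $L(\mathcal{A}_1)$ from $L(\mathcal{A}_2)$.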

This defines an infinite sequence of pairs $(v_n, w_n)_n$, from which we will iteratively
extract infinite subsequences to obtain additional properties, while keeping (1). This
is possible since (1) is preserved by extraction: if $\sigma(n) \ge n$, then $v_{\sigma(n)} \sim_{\sigma(n)} w_{\sigma(n)}$
implies $v_{\sigma(n)} \sim_n w_{\sigma(n)}$.

By Lemma 3, one can extract from $(v_n, w_n)_n$ a subsequence whose first component
forms an adequate sequence. From this subsequence of pairs, using Lemma 3
again, we extract a subsequence whose second component is also adequate (note
that the first component remains adequate). Therefore, one can assume that both
$(v_n)_n$ and $(w_n)_n$ are themselves adequate.

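Choosing proper factorization patterns for $(v_n)_n$ and $(w_n)_n$, which is possible by the
remark preceding Lemma 4, Lemma 4 shows that one can choose the same factorization
pattern $(\vec{u}, \vec{B})$ such that both $(v_n)_n$ and $(w_n)_n$ are $(\vec{u}, \vec{B})$-adequate. Finally, by the
following claim, we then obtain that both $\mathcal{A}_1$ and $\mathcal{A}_2$ admit a $(\vec{u}, \vec{B})$-path.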

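Claim 2. If $L(\mathcal{A})$ contains a $(\vec{u}, \vec{B})$-adequate sequence, then $\mathcal{A}$ admits a $(\vec{u}, \vec{B})$-path.

Indeed, let $(v_n)_n$ be a $(\vec{u}, \vec{B})$-adequate sequence contained in $L(\mathcal{A})$, i.e.,

$$\forall n \ge 0,\quad v_n \in u_0(B_1^\circledast)^nu_1 \cdots u_{p-1}(B_p^\circledast)^nu_p \cap L(\mathcal{A}).$$

Let $v_n$ be a sufficiently large term in this sequence, e.g. with $n > |Q(\mathcal{A})|$. Now
the path used to read $v_n$ in $\mathcal{A}$ must traverse loops labeled by each of the $B_i$'s: by the
pigeonhole principle, two of the $n + 1$ states visited at the boundaries of the n factors of
$(B_i^\circledast)^n$ coincide, and the path between them is a loop labeled by a nonempty product of
words of $B_i^\circledast$, hence an $(=B_i)$-loop. Clearly, by the shape of $v_n$, this is a $(\vec{u}, \vec{B})$-path. □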
Appendix A. Connection with profinite semigroup theory: overview

We show that separability of two languages by a $\mathbf{V}$-recognizable language is
equivalent to the emptiness of the intersection of their closures in the free
pro-$\mathbf{V}$ semigroup. This was already shown in [2]. The material of Section A.1 can be
found in [1].

A.1. Background. Fix a finite alphabet A and a pseudovariety $\mathbf{V}$. A semigroup T
separates $u, v \in A^+$ if there exists a morphism $\varphi : A^+ \to T$ such that $\varphi(u) \neq \varphi(v)$.
Given $u, v \in A^+$, let $r_{\mathbf{V}}(u, v) = \min\{|T| : T \in \mathbf{V} \text{ and } T \text{ separates } u \text{ and } v\} \in \mathbb{N} \cup \{\infty\}$.
Assume for simplicity that two distinct words can be separated by some
semigroup of $\mathbf{V}$. Then $d_{\mathbf{V}}(u, v) = 2^{-r_{\mathbf{V}}(u, v)}$, with $2^{-\infty} = 0$, defines a metric on $A^+$.
A sequence $(u_n)_n$ is Cauchy for this metric if and only if for every morphism $\varphi : A^+ \to T$
with $T \in \mathbf{V}$, $(\varphi(u_n))_n$ is eventually constant. Let $(\overline{\Omega}_A\mathbf{V}, d_{\mathbf{V}})$ be the completion of the
metric space $(A^+, d_{\mathbf{V}})$. By construction, $A^+$ is dense in $\overline{\Omega}_A\mathbf{V}$. Pointwise multiplication of
classes of Cauchy sequences transfers the semigroup structure of $A^+$ to $\overline{\Omega}_A\mathbf{V}$, on
which the multiplication is continuous.

Proposition 4. $(\overline{\Omega}_A\mathbf{V}, d_{\mathbf{V}})$ is compact.

Proof. One checks that every sequence $(u_n)_n$ of elements of $\overline{\Omega}_A\mathbf{V}$ has a converging
subsequence, that is, since $\overline{\Omega}_A\mathbf{V}$ is complete, a Cauchy subsequence. Since $A^+$
is dense in $\overline{\Omega}_A\mathbf{V}$, one can find a word $v_n$ such that $\lim_n d_{\mathbf{V}}(u_n, v_n) = 0$. This
reduces the statement to the case where $u_n$ is a word. Now, since there is a
finite number of morphisms from $A^+$ into a semigroup of size at most k, one
can extract by diagonalization a subsequence $(u'_n)_n$ of $(u_n)_n$ such that for any
morphism $\varphi : A^+ \to T$ with $|T| \le k$, $(\varphi(u'_n))_{n \ge k}$ is constant. □

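Endow $T \in \mathbf{V}$ with the discrete topology. The definition of $d_{\mathbf{V}}$ makes every
morphism $\varphi : A^+ \to T \in \mathbf{V}$ uniformly continuous. Since $A^+$ is dense in the compact
space $\overline{\Omega}_A\mathbf{V}$, $\varphi$ has a unique continuous extension $\hat{\varphi} : \overline{\Omega}_A\mathbf{V} \to T$ (which by continuity
of the multiplication is also a morphism). For $L \subseteq \overline{\Omega}_A\mathbf{V}$, denote by $\mathrm{cl}(L)$ its
topological closure in $\overline{\Omega}_A\mathbf{V}$.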

Lemma 7. Let $\varphi : A^+ \to T \in V$ and $K = \varphi^{-1}(P)$ for $P \subseteq T$. Then
$\mathrm{cl}(K) = \hat{\varphi}^{-1}(P)$.

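Proof. Finite unions commute with inverse images and closures, so it suffices to treat
the case $P = \{p\}$. Since $\hat{\varphi}$ is continuous and $T$ is discrete, $\hat{\varphi}^{-1}(p)$ is
clopen, and it contains $K$, so $\mathrm{cl}(K) \subseteq \hat{\varphi}^{-1}(p)$. Conversely, for
$u \in \hat{\varphi}^{-1}(p)$, pick for each $n$ a word $u_n$ such that $d_V(u, u_n) < 2^{-n}$ (which
exists since $A^+$ is dense in $\overline{\Omega}_A V$). Then $\varphi(u_n) = p$ for $n \geq |T|$, hence
$u_n \in K$, so $u \in \mathrm{cl}(K)$. □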

For $K \subseteq A^+$, we let $K^c = A^+ \setminus K$ and $\mathrm{cl}(K)^c = \overline{\Omega}_A V \setminus \mathrm{cl}(K)$.

Corollary 2. (1) If $K$ is V-recognizable, then $\mathrm{cl}(K^c) = \mathrm{cl}(K)^c$.

(2) If $K$ is V-recognizable and $L \subseteq A^+$ is such that $\mathrm{cl}(L) \subseteq \mathrm{cl}(K)$, then $L \subseteq K$.
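Proof. (1) Let $\varphi : A^+ \to T \in V$, with $K = \varphi^{-1}(P)$. Then $K^c = \varphi^{-1}(T \setminus P)$,
so by Lemma 7, $\mathrm{cl}(K^c) = \hat{\varphi}^{-1}(T \setminus P) = \overline{\Omega}_A V \setminus \hat{\varphi}^{-1}(P) = \mathrm{cl}(K)^c$.
For (2), just write $L \cap K^c \subseteq \mathrm{cl}(K) \cap \mathrm{cl}(K^c) = \emptyset$ by (1). □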

Proposition 5 (follows from [1, Thm. 3.6.1]). Closures of V-recognizable
languages form a basis of the topology of $\overline{\Omega}_A V$.

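Proof. By Lemma 7, the closure of a V-recognizable language is of the form $\hat{\varphi}^{-1}(P)$
for some continuous morphism $\hat{\varphi} : \overline{\Omega}_A V \to T \in V$, hence it is open. Conversely,
for $u \in \overline{\Omega}_A V$, let $O_n = \hat{\alpha}^{-1}(\hat{\alpha}(u))$, where $\alpha$ is the product of all morphisms
$\varphi : A^+ \to T \in V$ with $|T| \leq n$ (there are finitely many up to isomorphism, so the
product semigroup lies in $V$). By Lemma 7, $O_n$ is the closure of the V-recognizable
language $\alpha^{-1}(\hat{\alpha}(u))$. By construction, $O_n$ is an open set containing $u$, contained in
the ball of radius $2^{-n}$ centered at $u$. □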
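The sets $O_n$ can be explored on words in the same brute-force setting, again assuming for illustration that $V$ is the pseudovariety of all finite semigroups. For a word $u$, since $\hat{\alpha}$ extends $\alpha$, we have $O_n \cap A^+ = \alpha^{-1}(\alpha(u))$, and $\alpha(w)$ can be encoded as the tuple of the images of $w$ under all morphisms into multiplication tables with at most $n$ elements. The sketch below lists the words of bounded length in $O_n$; the function names and the length bound `max_len` are hypothetical.

```python
from itertools import product


def semigroups(k):
    """Yield every associative multiplication table on {0, ..., k-1}."""
    elems = range(k)
    cells = [(x, y) for x in elems for y in elems]
    for values in product(elems, repeat=k * k):
        table = dict(zip(cells, values))
        if all(table[table[x, y], z] == table[x, table[y, z]]
               for x in elems for y in elems for z in elems):
            yield table


def small_morphisms(alphabet, n):
    """All pairs (table, letter images) on at most n elements.

    Together they play the role of alpha, the product of all morphisms
    A+ -> T with |T| <= n (isomorphic copies are repeated, which is harmless).
    """
    return [(table, dict(zip(alphabet, images)))
            for k in range(1, n + 1)
            for table in semigroups(k)
            for images in product(range(k), repeat=len(alphabet))]


def alpha(word, morphisms):
    """alpha(word), encoded as the tuple of images of word under each morphism."""
    signature = []
    for table, phi in morphisms:
        value = phi[word[0]]
        for letter in word[1:]:
            value = table[value, phi[letter]]
        signature.append(value)
    return tuple(signature)


def words_in_neighbourhood(u, alphabet, n, max_len):
    """Words w with 1 <= |w| <= max_len and alpha(w) = alpha(u), i.e. O_n within A+."""
    morphisms = small_morphisms(alphabet, n)
    target = alpha(u, morphisms)
    candidates = (''.join(w) for length in range(1, max_len + 1)
                  for w in product(alphabet, repeat=length))
    return [w for w in candidates if alpha(w, morphisms) == target]
```

With $n = 2$, for example, $a^3$ and $a^5$ lie in the same set (no semigroup with at most two elements separates them), while $a^4$ does not, since the group of order 2 distinguishes odd from even powers.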

A.2. Separability of languages by a V-recognizable language. Two
languages $L_1, L_2 \subseteq A^+$ are V-separable if there exists a V-recognizable language $K$
such that $L_1 \subseteq K$ and $K \cap L_2 = \emptyset$. Such a language is a witness, in the given
variety of languages, that $L_1 \cap L_2 = \emptyset$, and we say that it separates $L_1$ and $L_2$.

Proposition 6. Two languages of $A^+$ are separated by a V-recognizable language
if and only if the intersection of their topological closures in $\overline{\Omega}_A V$ is empty.

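Proof. Let $L_1, L_2 \subseteq A^+$, and let $K$ be V-recognizable such that $L_1 \subseteq K$ and
$K \cap L_2 = \emptyset$. Then $L_2 \subseteq K^c$, so $\mathrm{cl}(L_1) \cap \mathrm{cl}(L_2) \subseteq \mathrm{cl}(K) \cap \mathrm{cl}(K^c) = \emptyset$ by Corollary 2.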

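Conversely, if $\mathrm{cl}(L_1) \cap \mathrm{cl}(L_2) = \emptyset$, then any $u \in \mathrm{cl}(L_1)$ belongs to the open
set $\mathrm{cl}(L_2)^c$, so by Proposition 5, there exists some V-recognizable language $K_u$
whose closure $O_u$ contains $u$ and is such that $O_u \cap \mathrm{cl}(L_2) = \emptyset$. Therefore
$\mathrm{cl}(L_1) \subseteq \bigcup_{u \in \mathrm{cl}(L_1)} O_u$. Since $\mathrm{cl}(L_1)$ is a closed set in the compact space $\overline{\Omega}_A V$
(Proposition 4), it is itself compact and has a finite cover $O_{u_1} \cup \cdots \cup O_{u_m}$. Then
$K = K_{u_1} \cup \cdots \cup K_{u_m}$ is V-recognizable, since V-recognizable languages are closed
under finite union. We have $\mathrm{cl}(L_1) \subseteq \mathrm{cl}(K)$, so by Corollary 2, $L_1 \subseteq K$. Also,
$K \subseteq O_{u_1} \cup \cdots \cup O_{u_m} \subseteq \mathrm{cl}(L_2)^c$, hence $K \cap L_2 = \emptyset$. □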

Appendix B. Proof of Lemma 1 
Lemma 1. The following problem is NP-complete. 

Input: An alphabet $A = \{a_1, a_2, \ldots, a_n\}$ and two DFAs $\mathcal{A}_1, \mathcal{A}_2$ over $A$.
Question: Do there exist $u \in L(\mathcal{A}_1)$ and $v \in L(\mathcal{A}_2)$ such that $\mathrm{alph}(u) = \mathrm{alph}(v)$?
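For small inputs, the question of Lemma 1 can be decided by brute force. The Python sketch below assumes a hypothetical encoding of a (possibly partial) DFA as a triple `(initial, finals, delta)`, where `delta[(state, letter)]` is the target state. It collects the alphabets $\mathrm{alph}(u)$ of all accepted words by exploring pairs (state, set of letters read so far); there are at most $|Q| \cdot 2^{|A|}$ such pairs, so the running time is exponential in $|A|$.

```python
from collections import deque


def alphabet_sets(dfa):
    """All sets alph(w) for words w accepted by a (partial) DFA.

    dfa = (initial, finals, delta) with delta[(state, letter)] = state.
    The empty set is included if the empty word is accepted.
    """
    initial, finals, delta = dfa
    start = (initial, frozenset())
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        state, letters = queue.popleft()
        if state in finals:
            result.add(letters)
        for (source, letter), target in delta.items():
            if source == state:
                nxt = (target, letters | {letter})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return result


def same_alphabet_witness_exists(dfa1, dfa2):
    """Are there u in L(dfa1) and v in L(dfa2) with alph(u) = alph(v)?"""
    return bool(alphabet_sets(dfa1) & alphabet_sets(dfa2))
```

For example, the automaton accepting $ab^*$ yields the alphabets $\{a\}$ and $\{a, b\}$, so it shares a witness with any automaton accepting $ba$, but none with one accepting $b^+$.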

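Proof. The problem is in NP: if such words $u$ and $v$ exist, then some exist whose lengths are polynomial in the sizes of $\mathcal{A}_1$ and $\mathcal{A}_2$, since a loop of an accepting run that reads no letter for the first time can be removed without changing the alphabet of the word. We show NP-hardness by a reduction from 3-SAT.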

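Let $\varphi$ be a 3-SAT formula over the variables $x_1, \ldots, x_n$, with $k$ clauses. Define
$A := \{x_1, \ldots, x_n, \neg x_1, \ldots, \neg x_n\}$. Let $\mathcal{A}_1$ be the automaton with states
$0, 1, \ldots, n$, initial state $0$, final state $n$ and, for each $i$, two transitions from state $i-1$
to state $i$ labeled $x_i$ and $\neg x_i$, so that $L(\mathcal{A}_1) = \{x_1, \neg x_1\}\{x_2, \neg x_2\} \cdots \{x_n, \neg x_n\}$.
Let $\mathcal{A}_2$ be the serial automaton in which, for every disjunct $d$ in the $i$-th
clause of $\varphi$, there is an arrow from state $i$ to state $i+1$ labeled $d$, followed by a copy
of $\mathcal{A}_1$ starting in state $k+1$. Thus $L(\mathcal{A}_2) = D_1 D_2 \cdots D_k \, L(\mathcal{A}_1)$, where $D_i$ is
the set of disjuncts of the $i$-th clause. For example, if
$\varphi = (x_1 \vee x_3 \vee \neg x_4) \wedge \cdots \wedge (x_4 \vee \neg x_5 \vee x_2)$, then $\mathcal{A}_2$ reads one of
$x_1, x_3, \neg x_4$ from state 1 to state 2, ..., one of $x_4, \neg x_5, x_2$ from state $k$ to state $k+1$,
and then continues as $\mathcal{A}_1$. Both automata are deterministic and of size linear in the
size of $\varphi$.
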
We will show that $\varphi$ is satisfiable if and only if the question above
has a positive answer for these $\mathcal{A}_1$ and $\mathcal{A}_2$.
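The construction of $\mathcal{A}_1$ and $\mathcal{A}_2$ is mechanical, as the following Python sketch shows; it builds both automata from a formula given as a list of clauses. The encoding is an assumption made for illustration: the letter $x_i$ is the string `"x{i}"`, the letter $\neg x_i$ is `"~x{i}"`, automata are triples `(initial, finals, delta)`, and the clause states of $\mathcal{A}_2$ are numbered from 0 rather than from 1. All function names are hypothetical.

```python
def literal(i, positive):
    """Letter of A for the literal x_i (positive) or its negation."""
    return f"x{i}" if positive else f"~x{i}"


def build_a1(n):
    """A1: from state i-1 to state i, read x_i or ~x_i; state n is final."""
    delta = {}
    for i in range(1, n + 1):
        delta[(i - 1, literal(i, True))] = i
        delta[(i - 1, literal(i, False))] = i
    return (0, {n}, delta)


def build_a2(n, clauses):
    """A2: read one disjunct of each clause in turn, then a copy of A1."""
    k = len(clauses)
    delta = {}
    for i, clause in enumerate(clauses, start=1):
        for d in clause:
            delta[(i - 1, d)] = i
    _, _, a1_delta = build_a1(n)
    for (state, letter), target in a1_delta.items():
        # Shift A1's states by k, so that its initial state is the last clause state.
        delta[(state + k, letter)] = target + k
    return (0, {k + n}, delta)


def accepts(dfa, word):
    """Run a partial DFA on a list of letters."""
    state, finals, delta = dfa
    for letter in word:
        if (state, letter) not in delta:
            return False
        state = delta[(state, letter)]
    return state in finals
```

For $\varphi = (x_1 \vee x_2) \wedge (\neg x_1 \vee x_2)$, the word $x_1 x_2 \neg x_1 x_2$ is accepted by $\mathcal{A}_2$: it reads $x_1$ for the first clause, $x_2$ for the second, and then $\neg x_1 x_2 \in L(\mathcal{A}_1)$.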

Suppose $\varphi$ is satisfiable. Then there is a valuation $\nu : \{x_1, \ldots, x_n\} \to \{0, 1\}$ such
that $\nu(\varphi) = 1$. Define $u := y_1 \cdots y_n$, with $y_i = x_i$ if $\nu(x_i) = 1$ and $y_i = \neg x_i$ if
$\nu(x_i) = 0$. In each of the $k$ clauses of $\varphi$, there is at least one disjunct $d$ for which
$\nu(d) = 1$. Define $v := w_1 \cdots w_k u$, where $w_i$ is any one of the disjuncts in the $i$-th
clause that evaluates to 1. Now, $u \in L(\mathcal{A}_1)$ and $v \in L(\mathcal{A}_2)$. Moreover, a literal
evaluates to 1 under $\nu$ if and only if it occurs in $u$, so each $w_i$ occurs in $u$ and
$\mathrm{alph}(v) = \mathrm{alph}(u)$.
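This direction of the proof is constructive. Under a hypothetical encoding (letters `"x{i}"` and `"~x{i}"`, a valuation given as a dictionary from variable indices to 0 or 1), the Python sketch below builds the words $u$ and $v$ from a satisfying valuation. It relies on the fact used above: a literal is true under $\nu$ exactly when it occurs in $u$.

```python
def witness_from_valuation(n, clauses, valuation):
    """Build u = y_1 ... y_n and v = w_1 ... w_k u from a satisfying valuation."""
    u = [f"x{i}" if valuation[i] == 1 else f"~x{i}" for i in range(1, n + 1)]
    true_literals = set(u)  # the literals evaluated to 1 by the valuation
    w = []
    for clause in clauses:
        satisfied = [d for d in clause if d in true_literals]
        if not satisfied:
            raise ValueError("the valuation does not satisfy every clause")
        w.append(satisfied[0])  # any true disjunct of the clause will do
    return u, w + u
```

Since every letter of $w$ already occurs in $u$, the two returned lists always have the same set of letters.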

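On the other hand, suppose that for these $\mathcal{A}_1$ and $\mathcal{A}_2$, there are $u \in L(\mathcal{A}_1)$ and
$v \in L(\mathcal{A}_2)$ with $\mathrm{alph}(u) = \mathrm{alph}(v)$. By construction of $\mathcal{A}_1$, for every $i$, $\mathrm{alph}(u)$ contains
exactly one of $x_i$ and $\neg x_i$. By construction of $\mathcal{A}_2$, we have $v = w u'$ with $u' \in L(\mathcal{A}_1)$
and $w = w_1 \cdots w_k$, where each $w_i$ is a disjunct of the $i$-th clause; since
$\mathrm{alph}(u) = \mathrm{alph}(v)$, we get $\mathrm{alph}(w) \subseteq \mathrm{alph}(u)$. Define the valuation

$$\nu : \{x_1, \ldots, x_n\} \to \{0, 1\}, \qquad x_i \mapsto \begin{cases} 1 & \text{if } x_i \in \mathrm{alph}(u), \\ 0 & \text{otherwise.} \end{cases}$$

Then $\nu$ sends every literal occurring in $u$, hence every literal occurring in $w$, to 1.
Since $w$ contains a disjunct of each clause, this gives $\nu(\varphi) = 1$.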

□ 
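The converse direction can be sketched in the same way, again with the hypothetical encoding of literals as `"x{i}"` and `"~x{i}"`: the valuation $\nu$ is read off from $\mathrm{alph}(u)$, and a small evaluator checks that it satisfies the formula. The function names are illustrative.

```python
def valuation_from_witness(n, u):
    """nu(x_i) = 1 if the letter x_i occurs in u, and 0 otherwise."""
    letters = set(u)
    return {i: 1 if f"x{i}" in letters else 0 for i in range(1, n + 1)}


def satisfies(clauses, valuation):
    """Evaluate a CNF formula; the literal '~x{i}' is true iff valuation[i] == 0."""
    def value(d):
        if d.startswith("~x"):
            return 1 - valuation[int(d[2:])]
        return valuation[int(d[1:])]
    return all(any(value(d) == 1 for d in clause) for clause in clauses)
```

Because $u \in L(\mathcal{A}_1)$ contains exactly one of $x_i$ and $\neg x_i$ for each $i$, every letter of $u$ is a literal made true by this valuation, which is the step used in the proof.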